
The American Journal of Human Genetics

Elsevier BV

Preprints posted in the last 30 days, ranked by how well they match The American Journal of Human Genetics's content profile, based on 206 papers previously published here. The average preprint has a 0.20% match score for this journal, so anything above that is already an above-average fit.

1
Pitfalls in estimating and interpreting the contribution of ultra-rare genetic variants to the heritability of complex traits

Wang, H.; Wainschtein, P.; Sidorenko, J.; Fikere, M.; Zhang, Y.; Kemper, K. E.; Zheng, Z.; Hivert, V.; Zeng, J.; Goddard, M. E.; Visscher, P. M.; Yengo, L.

2026-04-07 genetic and genomic medicine 10.64898/2026.04.06.26350278 medRxiv
Top 0.1%
53.5%

Assessing the contribution of ultra-rare variants (minor allele frequency <0.01%) to the heritability of complex traits remains challenging due to limited understanding of potential biases. Here, we focus on singletons (that is, variants observed only once in the study sample), the most abundant class of ultra-rare variants, to showcase various confounders of heritability estimates and underline pitfalls in their interpretation. We show through theory, simulations, and analysis of 5,330,210 exome-sequenced singletons in 305,813 unrelated European-ancestry individuals in the UK Biobank that (i) population stratification induces both upward and downward biases in singleton-based heritability estimates, (ii) estimates capture non-additive genetic effects, and (iii) asymptotic standard errors of estimates from likelihood-based procedures are generally mis-calibrated when traits are not normally distributed. We further showcase these biases in real-data analyses of 22 quantitative phenotypes and report, after accounting for these pitfalls, significant estimates for number of children (3.4%), peak expiratory flow (1.9%), red blood cell count (2.5%), white blood cell count (1.9%) and heel bone mineral density (2.4%). Overall, our study provides recommendations for robust inference of heritability from ultra-rare variants and underscores that reliable estimates for ordinal and binary traits will require far larger sample sizes and improved methods, given that confounding in these traits remains difficult to detect and correct.

2
Widespread genetic effect heterogeneity impacts bias and power in nonlinear Mendelian randomization

Wang, J.; Morrison, J.

2026-04-20 epidemiology 10.64898/2026.04.17.26351133 medRxiv
Top 0.1%
28.3%

Mendelian randomization (MR) uses genetic variants as instrumental variables to infer causal relationships between complex traits. Standard MR can be used to estimate an average causal effect at the population level, and typically assumes a linear exposure-outcome relationship. Recently, several methods for estimating nonlinear effects have been developed. However, many have been found to produce spurious empirical findings when subjected to negative control analyses. We propose that this poor performance may be attributable to heterogeneity in variant-exposure associations. We demonstrate that heterogeneous genetic effects on exposure lead to biased estimates, poor coverage, and inflated type I error in control function and stratification-based methods. In contrast, two-stage least squares (TSLS) methods are robust to such heterogeneity, but suffer from low precision and low power in some circumstances. We show that a statistical test for heterogeneity can be used to guide the choice of nonlinear MR methods. Using UK Biobank data, we reassess the causal effects of BMI, vitamin D, and alcohol consumption on blood pressure, lipids, C-reactive protein, and age (negative control). We find strong evidence of heterogeneity for all three exposures, and also recapitulate previous results that control function and stratification-based methods are prone to false positives. Finally, using nonparametric TSLS, we identify evidence of nonlinear causal effects of BMI on HDL cholesterol, triglycerides, and C-reactive protein; however, specific estimates of the shape of these relationships are imprecise. Altogether, our results suggest that common nonlinear MR methods are unreliable in the presence of realistic levels of heterogeneity, and that more methodological development is required before practically useful nonlinear MR is feasible.

3
De novo EHMT2 variants cause an autosomal dominant EHMT2-related Kleefstra syndrome via loss of G9a methyltransferase activity.

Hnizda, A.; Martinez-Delgado, B.; Sanchez-Ponce, D.; Alonso, J.; Amiel, J.; Attie-Bitach, T.; Bada-Navarro, A.; Baladron, B.; Bermejo-Sanchez, E.; Brinsa, V.; Bukova, I.; Cazorla-Calleja, R.; Cervenkova, S.; Chow, S.; Dusek, P.; Fedosieieva, O.; Fernandez-Prieto, M.; Ghosh, S.; Gomez-Mariano, G.; Gregorova, A.; Hamilton, M. J.; Hartmannova, H.; Hernandez-San Miguel, E.; Herrero-Matesanz, M.; Hodanova, K.; Kadek, A.; Kerkhof, J.; Kleefstra, T.; Lacombe, D.; Levy, M. A.; Lopez-Martin, E.; Lyse, R.; Man, P.; Marin-Reina, P.; Macnamara, E. F.; McConkey, H.; Melenovska, P.; Mielu, L. M.; Moore, D.;

2026-04-20 genetics 10.1101/2025.09.25.678439 medRxiv
Top 0.1%
23.6%

The EHMT1 and EHMT2 genes encode human euchromatin histone lysine methyltransferases 1 and 2 (EHMT1 alias GLP; EHMT2 alias G9a), which form heteromeric GLP/G9a complexes with essential roles in epigenetic regulation of gene expression. While EHMT1 haploinsufficiency has been established as the cause of Kleefstra syndrome 1, the pathogenesis of G9a dysfunction in human disease remains largely unknown. We identified seven de novo EHMT2 variants in patients with clinical presentation, episignatures, histone modifications and transcriptomic profiles similar to those of Kleefstra syndrome 1. In vitro studies revealed that these variants encode structurally stable G9a proteins that are catalytically incompetent due to aberrant interactions either with the histone H3 tail or with S-adenosylmethionine. Heterozygous mice carrying a patient-derived variant exhibited growth retardation, facial/skull dysmorphia and aberrant behavior. Here we report pathogenic EHMT2 variants that likely exert a dominant-negative effect on GLP/G9a complexes and thus genocopy EHMT1 haploinsufficiency via a distinct molecular mechanism, defining an autosomal dominant EHMT2-related Kleefstra syndrome.

4
Functionality-Informed Fine-Mapping Dissects Common Variant Contributions to Coronary Artery Disease and Identifies Causal Variants and Pathways

Jacobsen, J. T.; Moller, P. L.; Rohde, P. D.

2026-04-02 genetic and genomic medicine 10.64898/2026.04.01.26349823 medRxiv
Top 0.1%
23.2%

Genomics offers a powerful approach to identify causal mechanisms underlying coronary artery disease (CAD) risk, with implications for pathogenesis, personalized prevention strategies, and therapeutic target discovery. Functionality-informed genome-wide fine mapping was performed using the Bayesian framework SBayesRC to estimate genetic contributions of 6.9 million common variants, based on GWAS summary statistics from over one million individuals of European ancestry. Causal candidate genes were prioritized in a 5 kb flanking window within high-confidence local credible sets (LCSs). Their downstream biological influence was analyzed using protein-protein interaction networks and pathway enrichment analyses across three complementary dimensions: molecular, cellular, and disease level. Genetic modeling captured the highly polygenic architecture of CAD, estimating on average 34,000 variants to contribute to CAD risk, explaining 3.8% of total phenotypic variance. 36 high-confidence variants (PIP > 0.9) collectively explained 13.6% of genetic variance, while most variants demonstrated small individual effects but with substantial collective contributions. 17,150 variants were prioritized within 581 high-confidence LCSs, of which 195 were annotated to genes and 170 were implicated in downstream pathway analyses. The three most influential variants were mapped to PHACTR1, APOE, and LPL, explaining 2.49%, 1.59%, and 1.46% of genetic variance respectively. Pathway analyses revealed that genetic risk in CAD is driven by dysregulation of three interlinked biological processes: 1) lipoprotein function and cholesterol metabolism, 2) vascular homeostasis, and 3) cellular stress responses and inflammation. These findings advance the causal understanding of CAD pathogenesis, supporting the transition from association-based to functionality-informed genomic approaches in cardiovascular genetics.

5
A drug repurposing screen reveals dopamine signaling as a candidate therapeutic pathway for PIGA-CDG

Aziz, M. C.; Wilson, J.; Chow, C. Y.

2026-04-18 genetics 10.64898/2026.04.17.719256 medRxiv
Top 0.1%
23.0%

PIGA-CDG is a congenital disorder of glycosylation caused by pathogenic partial loss-of-function variants in the PIGA gene. PIGA encodes an enzyme responsible for the catalytic transfer of N-acetylglucosamine to phosphatidylinositol during the first step of glycosylphosphatidylinositol anchor biosynthesis. Loss of this enzyme has a widespread phenotypic impact, but primarily results in neurological symptoms including seizures, intellectual disability, and developmental delay. Currently, treatments are limited and focus on symptom management. We developed an eye model of PIGA-CDG that has a reduced eye size. We screened a library of 98% 1,520 FDA/EMA-approved compounds to find drugs that improved the small eye phenotype. This screen revealed numerous drugs that improved eye size, including those that targeted dopamine signaling and cyclooxygenases. Using pharmacological and genetic approaches, we show that modulating dopamine signaling improves eye size. Genetic inhibition of dopamine 2 receptor signaling and dopamine reuptake improves both the eye model and neurologically relevant PIGA-CDG phenotypes, including seizures and locomotor deficits. We also pharmacologically and genetically validate cyclooxygenase-targeting drugs in the eye model. These findings reveal novel biology underlying PIGA-CDG and point towards candidate therapeutic approaches.

AUTHOR SUMMARY: PIGA-CDG is a rare neurodevelopmental disorder caused by pathogenic variants in the gene PIGA. Patients primarily display neurological symptoms, including seizures, developmental delay, and intellectual disability. Fewer than 100 patients have been identified, and treatment strategies are limited. In the context of rare diseases, de novo drug development is difficult due to the high cost, lengthy development times, and patient populations that are often too small to support a clinical trial. Our lab leverages drug repurposing screening to circumvent many of the hurdles associated with de novo drug development. Here, we develop and screen FDA- or EMA-approved compounds on a Drosophila model of PIGA-CDG, uncovering novel biology underlying PIGA-associated pathophysiology. We use pharmacological and genetic tools to demonstrate that dopamine signaling and abundance, as well as cyclooxygenase-mediated pathways, contribute to PIGA-associated phenotypes. This work highlights promising therapeutic targets for PIGA-CDG.

6
Calibration of in-frame indel variant effect predictors for clinical variant classification

Abderrazzaq, H.; Singh, M.; Babb, L.; Bergquist, T.; Brenner, S. E.; Pejaver, V.; O'Donnell-Luria, A.; Radivojac, P.; ClinGen Computational Working Group; ClinGen Variant Classification Working Group

2026-04-18 bioinformatics 10.64898/2026.04.15.718599 medRxiv
Top 0.1%
22.8%

Insertions and deletions (indels) represent a substantial source of genetic variation in humans and are associated with a diverse array of functional consequences. Despite their prevalence and clinical importance, indels, particularly short in-frame indels, remain critically understudied compared to single nucleotide variants and are challenging to interpret clinically. While many computational predictors for missense variants have been rigorously evaluated and calibrated for clinical use, the clinical utility of tools for in-frame indels remains uncertain. To address this gap, we have calibrated in-frame indel prediction tools for clinical variant classification. We constructed a high-confidence dataset of in-frame indel variants (≤50 bp) from clinical and population databases and estimated the prior probability of pathogenicity of a rare in-frame indel observed in a disease-associated gene, and of an insertion and deletion separately. Using a previously developed statistical framework based on local posterior probabilities, we then established score thresholds for eight computational tools, corresponding to distinct evidence levels for pathogenic and benign classification according to ACMG/AMP guidelines. All in-frame indel predictors evaluated here reached multiple evidence levels of pathogenicity and/or benignity, demonstrating measurable clinical value. However, these models consistently exhibited lower performance levels compared to missense predictors, highlighting the need for improved computational approaches for indel classification.

7
SIEVE: Locus-Anchored Drug Prioritization for Complex Disorders

Strobl, E. V.

2026-04-17 pharmacology and therapeutics 10.64898/2026.04.15.26350958 medRxiv
Top 0.2%
18.9%

Motivation: Complex disorders arise from multiple genetic mechanisms, but most drug-prioritization methods treat each disorder as a single phenotype and therefore miss locus-specific therapeutic opportunities. Results: We present SIEVE, a framework that decomposes complex disorders into genetically localized subphenotypes and links GWAS summary statistics, reference expression, and perturbational transcriptional profiles to prioritize compounds that target locus-anchored disease mechanisms. SIEVE also constructs genetically calibrated mechanism vectors, projects away nonspecific expression programs using negative anchors, and aggregates evidence across cell lines, doses, and time points to produce robust drug rankings. Across simulations and analyses of real data, SIEVE improves compound prioritization relative to existing methods and shows that subphenotype-aware, genetics-guided modeling can sharpen therapeutic discovery in heterogeneous disorders. Availability and Implementation: R implementation: github.com/ericstrobl/SIEVE.

8
Deriving LD-adjusted GWAS summary statistics through linkage disequilibrium deconvolution

Nouira, A.; Favre Moiron, M.; Tournaire, M.; Verbanck, M.

2026-04-11 genetic and genomic medicine 10.64898/2026.04.10.26350574 medRxiv
Top 0.2%
18.5%

Genome-wide association studies (GWAS) have identified numerous genetic variants associated with complex traits. However, linkage disequilibrium (LD) confounds these associations, leading to false positives where non-causal variants appear associated because they are correlated with nearby causal variants. This is particularly the case in highly polygenic traits where the genome can be saturated in causal variants. To address this issue, we propose LDeconv, a method based on truncated singular value decomposition (SVD) that adjusts GWAS summary statistics without requiring individual-level genotype data. This approach accounts for LD structure, isolates causal variants in high-LD regions, and improves the reliability of effect size estimates. We assess its performance through simulations across various LD scenarios, conduct extensive sensitivity analyses, and apply it to real GWAS data from the UK Biobank. Our results demonstrate that LDeconv effectively reduces false discoveries while preserving true associations, offering a robust framework for post-GWAS analysis.
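
As a rough illustration of the general idea only (this is not the authors' LDeconv implementation), marginal association scores can be modeled as z ≈ R b, where R is the LD correlation matrix and b holds LD-adjusted effects. A truncated SVD pseudo-inverse of R then recovers b while discarding small singular values. The function name `ld_deconvolve`, the rank parameter `k`, and the toy LD matrix are all assumptions made for this sketch.

```python
import numpy as np

def ld_deconvolve(z, R, k):
    """Approximate b in z ~ R @ b using the top-k singular components of R.

    Hypothetical helper for illustration; k is the truncation rank.
    """
    U, s, Vt = np.linalg.svd(R)              # R = U @ diag(s) @ Vt
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    return Vt_k.T @ ((U_k.T @ z) / s_k)      # truncated pseudo-inverse applied to z

# Toy example: three variants, the first two in strong LD (r = 0.9).
R = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
b_true = np.array([5.0, 0.0, 0.0])   # only variant 0 is causal
z = R @ b_true                        # variant 1 appears associated through LD alone
b_hat = ld_deconvolve(z, R, k=3)      # full rank recovers b_true in this noiseless case
```

In this noiseless toy case the full-rank solution recovers the causal variant; with real, noisy summary statistics a smaller `k` would trade some resolution for stability, and how to choose it is beyond this sketch.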

9
PALM3 and hearing loss: a potential dual diagnosis interfering with novel gene discovery

Najarzadeh Torbati, P.; Hallbrucker, L.; Hofrichter, M. A. H.; Owrang, D.; Setzke, J.; Kilimann, M. W.; Hemmatpour, A.; Rajati, M.; Ghayoor Karimiani, E.; Haaf, T.; Vogl, C.; Vona, B.

2026-04-21 genetic and genomic medicine 10.64898/2026.04.20.26351093 medRxiv
Top 0.2%
18.4%

Hereditary hearing loss is highly genetically heterogeneous, with emerging overlap between genes implicated in early-onset and age-related hearing loss. We report a consanguineous family with autosomal recessive, non-syndromic hearing loss in which the proband harbors a homozygous splice-site variant in PALM3 (NM_001145028.2:c.314+1G>A) and a homozygous missense variant in OTOA. A minigene assay for the PALM3 variant demonstrated aberrant splicing with exon skipping, resulting in a frameshift and a large in-frame deletion, both consistent with loss of function and impacting all known transcripts. While the organ of Corti from 12-month-old heterozygous Palm3 mice showed preserved overall architecture, published Palm3 knockout mice exhibit auditory dysfunction, supporting an auditory phenotype with loss of function. Although a dual molecular diagnosis cannot be excluded, the combined genetic, functional, and comparative data support PALM3 as a strong candidate gene for autosomal recessive hearing loss.

10
Identifying Inheritance Patterns of Allelic Imbalance, using Integrative Modeling and Bayesian Inference

Hoyt, S. H.; Reddy, T. E.; Gordan, R.; Allen, A. S.; Majoros, W. H.

2026-03-31 bioinformatics 10.64898/2026.03.28.714974 medRxiv
Top 0.2%
18.2%

Interpreting the effects of novel mutations on phenotypic traits remains challenging, particularly for cis-regulatory variants. For rare variants, individuals typically possess at most one affected copy of the causal allele, leading to allelic imbalance, and thus the ability to infer inheritance of allelic imbalance can inform genetic studies of phenotypic traits. While many methods for detection of allele-specific expression (ASE) exist, they largely focus on ASE in one individual. We show that performing joint inference across multiple individuals in a trio allows for simultaneously improving estimates of ASE and identifying its likely mode of inheritance. Our Bayesian approach has the benefit of being able to (1) aggregate information across individuals so as to improve statistical power, (2) estimate uncertainty in estimates, and (3) rank modes of inheritance by posterior probability. We demonstrate that this model is also applicable to other forms of imbalance such as allele-specific chromatin accessibility. Applying the model to ATAC-seq and RNA-seq from several trios, we uncover examples in which ASE can be linked to imbalance in chromatin state of cis-regulatory elements and to potential causal variants. As the cost of sequencing continues to decrease, we expect that powerful methodologies such as the one presented here will promote more routine collection of samples from related individuals and improve our understanding of genetic effects on gene regulation and their contribution to phenotypic traits.

11
Beyond Exons: Linking Noncoding Heritability and Polygenicity across Complex Human Traits and Disorders

Fuhrer, J.; Shadrin, A. A.; Hughes, T.; Parker, N.; Hindley, G.; Frei, E.; Nguyen, D.; Smeland, O. B.; Djurovic, S.; Andreassen, O.; Dale, A.; Frei, O.

2026-04-03 genetics 10.64898/2026.04.01.715766 medRxiv
Top 0.2%
18.1%

The genetic architecture of complex traits spans a continuum of polygenicity, yet it remains unclear how differences in polygenicity relate to the functional localization of SNP heritability across the genome. We use a MiXeR-based framework to partition heritability across exonic, intronic, and intergenic regions for 34 traits and introduce a likelihood-based annotation contribution score that quantifies annotation-specific impact on heritability. Exons explain a minority of heritability, and their contribution decreases with increasing polygenicity, from an average of 22% in less polygenic somatic diseases and biomarkers to 13% in highly polygenic psychiatric and cognitive phenotypes. Intergenic fractions show the opposite trend, whereas intronic fractions remain relatively stable. Analysis of a broader set of functional annotations reveals systematic differences along the polygenicity axis: highly polygenic traits show stronger contributions from comparative genomics and variant-effect scores, whereas less polygenic traits show stronger contributions in promoter, transcription, and chromatin annotations. Together, these results indicate that the functional partitioning of heritability systematically varies with polygenicity, pointing to a shift from gene-proximal regulatory architectures to architectures shaped by numerous dispersed regulatory effects as a key determinant of differences in polygenicity across traits.

12
Clinical evidence yield as a framework for evaluating computational predictors and multiplexed assays of variant effect

Shang, Y.; Badonyi, M.; Marsh, J. A.

2026-03-30 bioinformatics 10.64898/2026.03.27.714777 medRxiv
Top 0.2%
17.2%

Interpreting the clinical significance of missense variants of uncertain significance (VUS) remains a major challenge in clinical genetics. Although computational variant effect predictors (VEPs) and multiplexed assays of variant effect (MAVEs) can generate large-scale functional scores, their value is typically assessed using discrimination metrics such as AUROC rather than by the strength of evidence they provide under ACMG/AMP guidelines. Here, we introduce mean evidence strength (MES), a quantitative metric that summarises the pathogenic and benign evidence assigned across missense variants following gene-level Bayesian calibration. Using the acmgscaler framework, we calibrated 12 population-free VEPs across 367 disease genes and analysed 15 MAVE datasets with sufficient clinical data. MES revealed important discrepancies with AUROC, including cases where methods with similar discrimination differed substantially in evidence yield. MAVEs achieved high average MES despite lower AUROC, while several VEPs showed strong discrimination but more limited calibrated evidence. Among predictors, CPT-1 achieved the highest MES and provided moderate or stronger evidence for the largest fraction of ClinVar VUS. MES therefore provides a practical framework for evaluating computational and experimental variant effect datasets in terms of calibrated clinical evidence yield.

13
Alignment-Free Microhaplotype Genotyping for GT-seq (Genotyping-in-Thousands by Sequencing) Using a Diploid Abundance Model

Campbell, N. R.; Campbell, A. R.; Blair, S. K.; Finger, A. J.

2026-04-03 genetics 10.64898/2026.04.01.715880 medRxiv
Top 0.3%
14.4%

GT-seq (Genotyping-in-Thousands by Sequencing) is widely used for high-throughput amplicon genotyping, but most analytical pipelines focus on single SNPs or rely on alignment-based variant calling. Here we present an alignment-free approach for microhaplotype genotyping that leverages the high read depth and low error rates typical of paired-end Illumina and Element sequencing. The pipeline first identifies primer-bounded reads and resolves paired-end sequences into complete amplicon sequences. Within each sample and locus, unique sequences are ranked by read abundance and the top one or two sequences are retained as candidate diploid alleles. These alleles are aggregated across samples to construct a catalog of unique haplotypes for each locus. In a second pass, reads are assigned to catalog haplotypes by exact sequence matching to produce diploid genotypes. Finally, catalog haplotype sequences are positionally compared to identify phased SNP and collapsed indel variation, generating compact microhaplotype representations suitable for population genetic analysis. This approach enables robust, alignment-free microhaplotype inference directly from high-depth amplicon sequencing data.
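
The per-sample, per-locus ranking step described above can be sketched as follows. This is a simplified illustration rather than the authors' pipeline: the function name `call_diploid` and the `min_ratio` cutoff for accepting a second allele are hypothetical, and the abstract does not specify how its diploid abundance model makes that decision.

```python
from collections import Counter

def call_diploid(reads, min_ratio=0.3):
    """Return the one or two most abundant sequences at a locus as a genotype.

    min_ratio is a hypothetical threshold: the second sequence is kept only
    if its read count is at least min_ratio times that of the first.
    """
    ranked = Counter(reads).most_common(2)   # unique sequences ranked by abundance
    if not ranked:
        return None                          # no reads at this locus
    top_seq, top_n = ranked[0]
    if len(ranked) == 2 and ranked[1][1] >= min_ratio * top_n:
        return tuple(sorted((top_seq, ranked[1][0])))   # heterozygote
    return (top_seq, top_seq)                           # homozygote

# Toy locus: two well-supported sequences plus a rare one that is likely an error.
reads = ["ACGT"] * 50 + ["ACGA"] * 40 + ["ACGG"] * 2
genotype = call_diploid(reads)
```

Because the candidate alleles are compared as whole strings, no reference alignment is needed at this step; the catalog-building and exact-matching passes described in the abstract would sit on top of it.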

14
Graph transformer for ancient ancestry inference

Shanks, C.; Bonet, D.; Comajoan Cara, M.; Ioannidis, A. G.

2026-04-07 genetics 10.64898/2026.04.05.714076 medRxiv
Top 0.3%
13.7%

Local ancestry inference classifies segments of DNA in admixed individuals by their originating population. However, as the date of admixture becomes older, these segments become shorter and determining their ancestry becomes increasingly difficult. This limits many existing segment-based methods to relatively recent historical admixture events and more highly diverged populations. The rapidly expanding availability of ancient DNA offers a promising opportunity to use these ancient samples as references for local ancestry inference. A recent approach integrates ancient samples into the ancestral recombination graph (ARG) for local ancestry inference. Here, we introduce recent advances in deep learning for graphs into this ARG framework to create ARGMix, a graph transformer that infers local ancestry using the coalescent trees of the inferred ARG. Our approach employs ancient samples as references in the marginal trees to predict local ancestry. We train ARGMix on data reflecting the well-understood ancient European demography and demonstrate improved accuracy and robustness even under demographic misspecification. We then apply ARGMix to an ARG of ancient and present-day European samples for ancestry-specific analyses, finding evidence of continuity between Ötzi the Iceman and present-day individuals from nearby regions.

15
A non-coding variant at 2p24.2 confers susceptibility to non-syndromic cleft lip and palate through LLPS-dependent regulation of MYCN

Wu, Z.; Yuan, Z.; Yang, R.; Huang, Z.; Liu, Y.; Sun, L.; Bian, Z.; He, M.

2026-04-07 genetic and genomic medicine 10.64898/2026.04.07.26350283 medRxiv
Top 0.3%
12.3%

Non-syndromic cleft lip and palate (NSCLP) represents the most prevalent and clinically severe subtype within non-syndromic orofacial cleft (NSOFC), and 2p24.2 is the most significant reported risk locus for NSCLP. However, the causal variant at 2p24.2 and the underlying pathogenic mechanism remain unclear, limiting clinical translation. Here, we defined a 104-kb linkage disequilibrium (LD) block tagged by the lead SNP rs7552 at 2p24.2. Through a two-stage genetic screen within this block, including targeted sequencing and replication involving 2,437 Chinese NSCLP patients and 2,391 unaffected individuals, we identified a common non-coding single-nucleotide polymorphism, rs4263114, at 2p24.2 as the causal variant that confers susceptibility to NSCLP by residing within a previously unrecognized enhancer. Mechanistically, this enhancer physically bridges to the MYCN promoter through distal spatial contact, implicating MYCN as the pathogenic gene at this locus. Specifically, the rs4263114 risk variant reduces the recruitment of FOXP2 to the enhancer and disrupts liquid-liquid phase separation (LLPS)-driven droplet assembly. This biophysical defect impairs MYCN transcriptional activation and subsequently suppresses cranial neural crest cell (cNCC) differentiation. Notably, MYCN expression in cNCCs carrying homozygous risk alleles was partially restored by promoting FOXP2 LLPS. Collectively, our study functionally annotates the 2p24.2 locus and identifies a mechanism by which a non-coding variant disrupts transcription factor phase separation to increase susceptibility to NSCLP, providing a basis for future clinical translation.

16
APOE4 Allele Frequencies Show Dramatic Variation Across Indian Populations

Ramdas, S.; Kahali, B.

2026-04-13 genetic and genomic medicine 10.64898/2026.04.09.26350483 medRxiv
Top 0.4%
12.0%

The APOE ε4 allele is the strongest genetic risk factor for Alzheimer's disease. However, its distribution across Indian populations is poorly characterized. We analyze APOE allele frequencies in 9,524 individuals from 83 distinct populations in the GenomeIndia dataset. ε4 frequencies show large variation across populations within India, ranging from 2.7% to 36.1%, with a median of 11%. Tribal populations have higher ε4 frequencies compared to non-tribal groups, while Tibeto-Burman populations have significantly lower frequencies. One tribal population from the northern coastal highlands has an ε4 frequency of 0.36, with 59% of individuals being carriers. ε4 carrier status correlates significantly with lipid phenotypes including LDL, HDL, total cholesterol, and triglycerides. Collectively, these findings reveal exceptional genetic diversity in Alzheimer's disease risk across India and have important implications for population-specific screening strategies, genetic counseling, and precision medicine approaches to dementia prevention.
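
The relationship between the reported allele frequency (0.36) and carrier fraction (59%) can be checked with a short calculation. The sketch below assumes Hardy-Weinberg proportions, which the abstract does not state, so it is a consistency check rather than a description of how the authors derived the figure; `carrier_frequency` is a hypothetical helper name.

```python
def carrier_frequency(allele_freq):
    """Fraction of individuals with at least one copy of the allele,
    assuming Hardy-Weinberg proportions (an assumption of this sketch)."""
    non_carriers = (1.0 - allele_freq) ** 2   # individuals with zero copies
    return 1.0 - non_carriers

carriers = carrier_frequency(0.36)   # 1 - 0.64**2, roughly 0.59
```

Under that assumption an allele frequency of 0.36 corresponds to about 59% carriers, in line with the figure quoted in the abstract.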

17
Incorporating phenotype heterogeneity in disease GWAS improves power while maintaining specificity

Hof, J. J. P.; Ning, C.; Quinn, L.; Speed, D.

2026-03-27 genetic and genomic medicine 10.64898/2026.03.26.26349370 medRxiv
Top 0.4%
10.4%

Common complex diseases are clinically heterogeneous, yet most genome-wide association studies (GWAS) assume cases are genetically homogeneous. This challenge is compounded in large-scale biobanks, which increasingly combine cases ascertained under different recruitment strategies, raising concerns that heterogeneous case definitions may dilute genetic signal. To address this, we developed StratGWAS, a scalable framework that leverages clinical features of heterogeneity to construct a transformed phenotype that better reflects genetic liability within diseases. StratGWAS stratifies cases using secondary phenotypic information such as age of onset, medication burden, or recruitment definition. StratGWAS then estimates genetic covariance between strata and derives a transformed phenotype that upweights cases with higher inferred genetic liability. Through simulation studies (N = 100k) and application to the UK Biobank (N = 368k), we show that StratGWAS consistently outperformed standard GWAS methods. Applied to 21 UK Biobank traits, StratGWAS upweighted individuals with earlier disease onset and higher medication burden, yielding respectively 17% and 4% more independent genome-wide significant loci than standard case-control GWAS. Applied to depression, StratGWAS upweighted individuals with multiple diagnoses, greater psychiatric comorbidity, or higher self-reported depressive symptoms, identifying eight additional independent loci compared to case-control GWAS.

18
Charting the cognitive development of children using adult 'polygenic g scores'

Lin, Y.; Plomin, R.

2026-04-05 genetics 10.64898/2025.12.19.695378 medRxiv
Top 0.4%
10.4%

The most highly predictive polygenic scores in the behavioural sciences are for cognitive traits, especially general cognitive ability (g) and educational attainment. We combined polygenic scores derived from genome-wide association studies of adult g and educational attainment to create adult 'polygenic g scores' which we used to chart the course of cognitive development of 10,000 white British children from toddlerhood through early adulthood. We integrated cross-sectional regression, latent growth curve, and confirmatory factor analysis to systematically characterise cognitive development. Polygenic g score showed minimal prediction in toddlerhood, modest prediction in childhood, and substantial prediction by early adulthood accounting for 12% of the variance. Higher polygenic g scores were associated with faster cognitive growth in latent growth models. Prediction was strongest for a cross-time latent cognitive factor (15%) capturing cognitive ability across development. By integrating polygenic prediction directly into a structural equation model framework, we provided a theoretical upper bound of genetic influences on g under minimal measurement error. We also examined the polygenic g score's prediction of educational achievement, behaviour problems, and anthropometric outcomes and found similar developmental increases in prediction for educational achievement. Together, our findings demonstrate that adult polygenic g scores can be a useful tool for charting the development of cognitive traits.

19
Somatic mutation of ELF4 causes autoinflammatory diseases and cell type-specific immune alterations

Zhang, Q.; Lei, Y.; Zhao, X.; Du, H.

2026-04-11 allergy and immunology 10.64898/2026.04.08.26350315 medRxiv
Top 0.4%
10.2%

ELF4 is an ETS family transcription factor involved in immune regulation, and germline loss-of-function mutations in ELF4 are known to cause deficiency in ELF4, X-linked (DEX). To date, ELF4-related disease has been exclusively associated with germline mutations. Here, we report a pediatric patient with recurrent mucocutaneous inflammation and periodic fever caused by a somatic truncating mutation in ELF4. By directly comparing ELF4-mutant and wild-type immune cells within the same individual using full-length single-cell RNA sequencing, we identified mutation-associated transcriptional alterations across multiple immune cell types. Pathway analyses revealed cell type-specific immune alterations, characterized by reduced antiviral and interferon-related signaling in NK cells and enhanced inflammatory pathways related to Th17 differentiation and inflammatory bowel disease in CD16 monocytes. This study expands the disease spectrum of ELF4 deficiency by identifying somatic truncation of ELF4 as a genetic mechanism underlying autoinflammatory diseases and biased immune programs.

20
A C. elegans model for functional analysis of ADPKD variants in cilia, extracellular vesicles, and sensory signaling

Wang, J.; Nava Cruz, C.; Walsh, J. D.; desRanleau, E.; Nikonorova, I. A.; Barr, M. M.

2026-04-15 genetics 10.64898/2026.04.14.718433 medRxiv
Top 0.4%
10.1%

Interpreting the pathogenic significance of missense variants in human disease gene candidates remains a major challenge in precision medicine. Autosomal dominant polycystic kidney disease (ADPKD) is the most common genetic cause of kidney failure and is caused by mutations in the PKD1 or PKD2 genes, which encode polycystin-1 and polycystin-2. Here, we establish C. elegans as a platform for the functional classification of PC2 variants by characterizing PKD-2C180S, the C. elegans ortholog of the likely pathogenic human variant PC2C331S. Using CRISPR/Cas9 endogenous genome editing combined with dual-color fluorescent reporters and super-resolution imaging, we show that PKD-2C180S severely reduces protein stability, abolishes ciliary and extracellular vesicle (EV) localization, and eliminates sensory function comparable to a pkd-2 null allele. In heterozygous animals, PKD-2C180S is recessive and exerts no dominant-negative effect on wild-type PKD-2 trafficking, protein levels, or function, establishing that PKD-2 is haplosufficient in this model. PKD-2C180S also abolishes ciliary and EV localization of the PC1 homolog LOV-1 and reduces LOV-1 cell body levels comparable to pkd-2 null animals, consistent with PC2 functioning as a molecular chaperone for PC1 stability and trafficking. Genetic epistasis experiments show that PKD-2C180S protein levels are unaffected in lov-1 mutants, indicating that the PKD-2C180S mutation acts prior to complex assembly. Quantitative analysis reveals that LOV-1*PKD-2 complexes are more stable at the ciliary membrane and more efficiently packaged into EVs than PKD-2 lacking LOV-1. Together, this work demonstrates that PC2C331S may act recessively via loss of polycystin complex function and establishes a C. elegans pipeline for the mechanistic classification of ADPKD-associated variants.